# How to make a tsibble:
(y <- tsibble(Year = 2015:2019, Observation = c(123, 39, 78, 52, 110), index = Year))
# A tsibble: 5 x 2 [1Y]
Year Observation
<int> <dbl>
1 2015 123
2 2016 39
3 2017 78
4 2018 52
5 2019 110
# Out of a tibble:
(y <- tibble(Year = 2015:2019, Observation = c(123, 39, 78, 52, 110)) %>% as_tsibble(index = Year))
# A tsibble: 5 x 2 [1Y]
Year Observation
<int> <dbl>
1 2015 123
2 2016 39
3 2017 78
4 2018 52
5 2019 110
Using the ansett tsibble, we graph only the economy-class flights from Melbourne to Sydney. We can see several drops on Christmas Day, due to the lack of flights on that day of the year, and several values of 0 before 1990.
ansett
# A tsibble: 7,407 x 4 [1W]
# Key: Airports, Class [30]
Week Airports Class Passengers
<week> <chr> <chr> <dbl>
1 1989 W28 ADL-PER Business 193
2 1989 W29 ADL-PER Business 254
3 1989 W30 ADL-PER Business 185
4 1989 W31 ADL-PER Business 254
5 1989 W32 ADL-PER Business 191
6 1989 W33 ADL-PER Business 136
7 1989 W34 ADL-PER Business 0
8 1989 W35 ADL-PER Business 0
9 1989 W36 ADL-PER Business 0
10 1989 W37 ADL-PER Business 0
# … with 7,397 more rows
We can see that lags 4 and 8 have a strong positive correlation; this is due to the seasonal pattern that repeats every year (summer).
Autocorrelation function (ACF)
Having all the lags from the previous topic, we can calculate the autocorrelation at each lag: the same relationships we graphed in the lag plots, but now as numeric values.
Once we have those values, we can see the same patterns: strong negative correlation at lags 2, 6, 10, … and strong positive correlation at lags 4, 8, … This is due to trend, seasonality, or a combination of both.
Some rules:
When data have a trend, the autocorrelations for small lags tend to be large and positive.
When data are seasonal, the autocorrelations will be larger at the seasonal lags.
When data have both, you see a combination of these effects.
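These rules can be illustrated with a small base-R sketch (simulated data, all names hypothetical), using stats::acf on a series that has both an upward trend and annual seasonality:

```r
# Simulated monthly series with an upward trend and period-12 seasonality.
set.seed(42)
n <- 120                                         # 10 years of monthly data
trend    <- 0.5 * seq_len(n)                     # upward trend
seasonal <- 10 * sin(2 * pi * seq_len(n) / 12)   # annual seasonality
x <- trend + seasonal + rnorm(n, sd = 2)         # add some noise

r <- acf(x, lag.max = 24, plot = FALSE)$acf[-1]  # autocorrelations, lag 0 dropped
r[1]    # large and positive at small lags, driven by the trend
r[12]   # elevated at the seasonal lag compared with its neighbours
```

The trend keeps the whole ACF positive and slowly decaying, while the seasonality lifts lag 12 above lags 6 and 18.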
# A tsibble: 4 x 3 [!]
# Key: Symbol [4]
# Groups: Symbol [4]
Symbol Date Close
<chr> <date> <dbl>
1 AAPL 2018-10-03 232.
2 AMZN 2018-09-04 2040.
3 FB 2018-07-25 218.
4 GOOG 2018-07-26 1268.
3.
tute1 <- readr::read_csv("Excels/tute1.csv")
Rows: 100 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (3): Sales, AdBudget, GDP
date (1): Quarter
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tsibble: 24,320 x 5 [1Q]
# Key: Region, State, Purpose [304]
Quarter Region State Purpose Trips
<qtr> <chr> <chr> <chr> <dbl>
1 1998 Q1 Adelaide South Australia Business 135.
2 1998 Q2 Adelaide South Australia Business 110.
3 1998 Q3 Adelaide South Australia Business 166.
4 1998 Q4 Adelaide South Australia Business 127.
5 1999 Q1 Adelaide South Australia Business 137.
6 1999 Q2 Adelaide South Australia Business 200.
7 1999 Q3 Adelaide South Australia Business 169.
8 1999 Q4 Adelaide South Australia Business 134.
9 2000 Q1 Adelaide South Australia Business 154.
10 2000 Q2 Adelaide South Australia Business 169.
# … with 24,310 more rows
tsibble::tourism
# A tsibble: 24,320 x 5 [1Q]
# Key: Region, State, Purpose [304]
Quarter Region State Purpose Trips
<qtr> <chr> <chr> <chr> <dbl>
1 1998 Q1 Adelaide South Australia Business 135.
2 1998 Q2 Adelaide South Australia Business 110.
3 1998 Q3 Adelaide South Australia Business 166.
4 1998 Q4 Adelaide South Australia Business 127.
5 1999 Q1 Adelaide South Australia Business 137.
6 1999 Q2 Adelaide South Australia Business 200.
7 1999 Q3 Adelaide South Australia Business 169.
8 1999 Q4 Adelaide South Australia Business 134.
9 2000 Q1 Adelaide South Australia Business 154.
10 2000 Q2 Adelaide South Australia Business 169.
# … with 24,310 more rows
# A tsibble: 304 x 5 [1Q]
# Key: Region, State, Purpose [304]
# Groups: Region, Purpose [304]
Quarter Region Purpose Trips State
<qtr> <chr> <chr> <dbl> <chr>
1 2017 Q4 Melbourne Visiting 985. Victoria
2 2001 Q4 Sydney Business 948. New South Wales
3 2016 Q4 Sydney Visiting 921. New South Wales
4 1998 Q1 South Coast Holiday 915. New South Wales
5 2016 Q1 North Coast NSW Holiday 906. New South Wales
6 1998 Q1 Sydney Holiday 828. New South Wales
7 2017 Q4 Melbourne Holiday 806. Victoria
8 2016 Q4 Brisbane Visiting 796. Queensland
9 2002 Q1 Gold Coast Holiday 711. Queensland
10 2017 Q3 Melbourne Business 704. Victoria
# … with 294 more rows
# A tsibble: 508 x 3 [1Q]
# Key: Origin [4]
Quarter Origin Arrivals
<qtr> <chr> <int>
1 1981 Q1 Japan 14763
2 1981 Q2 Japan 9321
3 1981 Q3 Japan 10166
4 1981 Q4 Japan 19509
5 1982 Q1 Japan 17117
6 1982 Q2 Japan 10617
7 1982 Q3 Japan 11737
8 1982 Q4 Japan 20961
9 1983 Q1 Japan 20671
10 1983 Q2 Japan 12235
# … with 498 more rows
autoplot(aus_arrivals)
Plot variable not specified, automatically selected `.vars = Arrivals`
gg_season(aus_arrivals)
Plot variable not specified, automatically selected `y = Arrivals`
gg_subseries(aus_arrivals)
Plot variable not specified, automatically selected `y = Arrivals`
As we can see, all four origins have a seasonal pattern, some stronger than others: the UK shows the largest oscillation each year and the US the smallest. This suggests certain events or weather patterns between the UK and Australia make people travel to Australia at particular times of the year.
We can also see that Japan had an increasing trend until 1996, while all the others show a slight to clearly visible increasing trend throughout.
In the seasonal plot we can confirm all the previous observations about seasonality, and also see where the seasonality comes from (which quarters of the year) for each origin.
# A tsibble: 441 x 5 [1M]
# Key: State, Industry [1]
State Industry `Series ID` Month Turnover
<chr> <chr> <chr> <mth> <dbl>
1 Victoria Household goods retailing A3349643V 1982 Apr 173.
2 Victoria Household goods retailing A3349643V 1982 May 180.
3 Victoria Household goods retailing A3349643V 1982 Jun 167.
4 Victoria Household goods retailing A3349643V 1982 Jul 174.
5 Victoria Household goods retailing A3349643V 1982 Aug 178.
6 Victoria Household goods retailing A3349643V 1982 Sep 180.
7 Victoria Household goods retailing A3349643V 1982 Oct 190.
8 Victoria Household goods retailing A3349643V 1982 Nov 224.
9 Victoria Household goods retailing A3349643V 1982 Dec 321.
10 Victoria Household goods retailing A3349643V 1983 Jan 179.
# … with 431 more rows
autoplot(retail, Turnover)
gg_season(retail, Turnover)
gg_subseries(retail, Turnover)
gg_lag(retail, Turnover, geom = "point")
autoplot(ACF(retail, Turnover))
As we can see in our initial plot of Victoria's retail turnover by month, there is an obvious upward trend that looks slightly exponential towards the end of the graph. We can also see a seasonality whose amplitude grows as turnover increases (it is proportional to the level).
In the seasonal plot we can spot where the seasonal spikes are, which are due to the holidays; we can also see a more irregular spike across the years during the summer break.
In the seasonal subseries plot we can confirm the proportionality between the spikes and the level of turnover: December is the month with the highest turnover, and it also shows an increase across the years.
Due to the combination of strong seasonality and an upward trend, all lags across several months show very strong autocorrelation.
We can confirm this when we plot the autocorrelation function: every value is well above the significance bounds because of the strong upward trend, and there are spikes every 12 months because of the seasonality.
8.
(us_employment <- filter(us_employment, Title == "Total Private"))
# A tsibble: 969 x 4 [1M]
# Key: Series_ID [1]
Month Series_ID Title Employed
<mth> <chr> <chr> <dbl>
1 1939 Jan CEU0500000001 Total Private 25338
2 1939 Feb CEU0500000001 Total Private 25447
3 1939 Mar CEU0500000001 Total Private 25833
4 1939 Apr CEU0500000001 Total Private 25801
5 1939 May CEU0500000001 Total Private 26113
6 1939 Jun CEU0500000001 Total Private 26485
7 1939 Jul CEU0500000001 Total Private 26481
8 1939 Aug CEU0500000001 Total Private 26848
9 1939 Sep CEU0500000001 Total Private 27468
10 1939 Oct CEU0500000001 Total Private 27830
# … with 959 more rows
autoplot(us_employment, Employed)
gg_season(us_employment, Employed)
gg_subseries(us_employment, Employed)
gg_lag(us_employment, Employed, geom = "point")
autoplot(ACF(us_employment, Employed))
Strong upward trend and very small seasonality, which we can confirm in the seasonal plot. There is a slight dip in the trend around 2010, visible in the seasonal subseries plot. The series has strong positive autocorrelation due to both seasonality and trend; the ACF is dominated by the strong trend, and there is little extra structure at multiples of 12 (annual) because the seasonality is small.
We can see an increasing trend in the first half, then two important drops, after which the series stays fairly level overall, with a seasonality that varies a little each cycle. Looking at the seasonal plot we can confirm the highs and lows in the second half and see how the seasonality stops being simple compared to the first half. Because the strong trend is lost and the seasonality is not well defined, the correlations at the lags are not very strong, but we can still see a seasonal pattern every four-month period.
(pelt <- pelt %>% as_tsibble(index = Year))
# A tsibble: 91 x 3 [1Y]
Year Hare Lynx
<dbl> <dbl> <dbl>
1 1845 19580 30090
2 1846 19600 45150
3 1847 19610 49150
4 1848 11990 39520
5 1849 28040 21230
6 1850 58000 8420
7 1851 74600 5560
8 1852 75090 5080
9 1853 88480 10170
10 1854 61280 19600
# … with 81 more rows
autoplot(pelt, Hare)
gg_lag(pelt, Hare, geom = "point")
autoplot(ACF(pelt, Hare))
This data is annual, so it would be unfair to say it has within-year seasonality, since there is only one observation per year. Nevertheless, looking at the line plot and ACF plot, we can notice a cycle of roughly 10 years, each containing a full drop and spike.
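A quick simulated sketch in base R (hypothetical data, not the pelt series itself) shows how a roughly 10-year cycle appears as an ACF peak near lag 10:

```r
# Annual series with a cycle of about 10 years plus noise.
set.seed(1)
yr <- 1:90
x  <- 50 + 30 * sin(2 * pi * yr / 10) + rnorm(90, sd = 5)

r <- acf(x, lag.max = 20, plot = FALSE)$acf[-1]  # lags 1..20
peak <- which.max(r[6:15]) + 5                   # strongest lag between 6 and 15
peak    # close to the 10-year cycle length
```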
`summarise()` has grouped output by 'Month'. You can override using the
`.groups` argument.
# A tsibble: 204 x 3 [1M]
# Key: ATC2 [1]
# Groups: @ Month [204]
Month ATC2 Cost
<mth> <chr> <dbl>
1 1991 Jul H02 429795
2 1991 Aug H02 400906
3 1991 Sep H02 432159
4 1991 Oct H02 492543
5 1991 Nov H02 502369
6 1991 Dec H02 602652
7 1992 Jan H02 660119
8 1992 Feb H02 336220
9 1992 Mar H02 351348
10 1992 Apr H02 379808
# … with 194 more rows
autoplot(PB, Cost)
gg_season(PB, Cost)
gg_subseries(PB, Cost)
gg_lag(PB, Cost, geom = "point")
autoplot(ACF(PB, Cost))
We can see a clear and very strong seasonality: a big drop in January followed by a recovery over the whole year, ending back at the same point, all on top of a general upward trend. In the seasonal plot we can see how clean and sudden the drop is, and how the recovery through the year shows very different highs and lows. In the ACF we can see that the annual seasonality is very strong, while all the other correlations are weak because the values vary so much within the year.
We can see a general upward trend that slows and changes slightly at the end of the graph; the seasonal plot shows a lot of variation because the data are weekly. The correlations are strong due to seasonality and trend.
9.
2A, 3D, 1B, 4C
10.
aus_live <- filter(aus_livestock, year(Month) >= 1990 & year(Month) <= 1995,
                   State == "Victoria", Animal == "Pigs")
autoplot(aus_live, Count)
autoplot(ACF(aus_live, Count))
We can see a very obvious change: when we filter the data we see a strong upward trend, but that apparent trend may actually be part of a cycle in the full history of the data, which is revealed when we use all of it. Filtering also makes the significance bounds of the ACF wider and closer to the data.
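The change in the ACF bounds can be quantified: the significance bounds in an ACF plot are approximately ±1.96/√N, so filtering to fewer observations makes them wider. A small sketch (the sample sizes here are hypothetical):

```r
# ACF significance bounds are roughly +/- 1.96 / sqrt(N).
acf_bound <- function(n) 1.96 / sqrt(n)

acf_bound(72)    # filtered series (e.g. 6 years of monthly data): wider bounds
acf_bound(558)   # full series: narrower bounds
```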
Because a stock is only traded on weekdays, the date index is not regular (there is no data on weekends). Re-indexing by trading-day number makes the data regular again. We can see this change in the line plot, but we cannot really see a difference in the ACF. c.
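The re-indexing idea can be sketched in base R (the dates here are made up): weekday-only dates leave gaps at weekends, but numbering the trading days consecutively gives a regular interval, which is what switching the tsibble index to a trading-day counter achieves:

```r
# Two weeks of calendar dates, keeping only the weekdays (Mon-Fri).
dates   <- seq(as.Date("2018-01-01"), as.Date("2018-01-14"), by = "day")
trading <- dates[as.POSIXlt(dates)$wday %in% 1:5]  # locale-independent weekday test

diff(as.integer(trading))   # irregular: gaps of 3 days over each weekend
trading_day <- seq_along(trading)
diff(trading_day)           # regular: always 1
```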
autoplot(dgoog, Close)
autoplot(ACF(dgoog, Close))
autoplot(dgoog1, Close)
autoplot(ACF(dgoog1, Close))
Warning: Provided data has an irregular interval, results should be treated
with caution. Computing ACF by observation.
From a general point of view, there is no real difference between the two; the plotted correlations are the same.